Suffix Stripping Problem as an Optimization Problem

Authors

  • B. P. Pande
  • Pawan Tamta
  • H. S. Dhami
Abstract

Stemming, or suffix stripping, is an important part of modern Information Retrieval systems and aims to find the root word (stem) of a given cluster of words. Existing algorithms targeting this problem have been developed in a haphazard manner. In this work, we model this problem as an optimization problem. An Integer Program (IP) is developed to overcome the shortcomings of the existing approaches. Sample results of the proposed method are compared with an established technique in the field for the English language. AMPL code for the IP is also given.
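The abstract does not spell out the Integer Program itself. To show what "stemming as an optimization problem" can look like, here is a toy 0-1 selection model under our own assumptions. It is not the authors' IP or their AMPL model. Exactly one candidate stem (a prefix of some word in the cluster) is selected. A word counts as covered when it equals the stem followed by a suffix from a given suffix list. The objective is to cover as many words as possible, and ties are broken in favour of the longer stem. The function `best_stem`, the suffix list and the scoring rule are all hypothetical. The model is small enough to solve by exhaustive enumeration.

```python
def best_stem(cluster, suffixes):
    """Toy stem selection (hypothetical, not the paper's IP).

    Candidate stems are all non-empty prefixes of words in `cluster`.
    A word w is covered by stem s when w == s + t for some t in
    `suffixes` (the empty suffix is always allowed). Exactly one stem
    is chosen: the one covering the most words, with the longer stem
    preferred on ties. Solved by enumerating every candidate.
    """
    allowed = set(suffixes) | {""}
    candidates = {w[:i] for w in cluster for i in range(1, len(w) + 1)}

    def score(stem):
        covered = sum(
            1 for w in cluster
            if w.startswith(stem) and w[len(stem):] in allowed
        )
        return (covered, len(stem))

    # sorted() makes tie-breaking between equal scores deterministic.
    return max(sorted(candidates), key=score)


cluster = ["connect", "connected", "connecting", "connection", "connections"]
print(best_stem(cluster, ["ed", "ing", "ion", "ions", "s"]))
```

With the illustrative suffix list, "connect" covers all five words, so it is selected over longer prefixes such as "connection", which covers only two. A real IP formulation would replace the enumeration with binary decision variables and a solver. That is the route the abstract describes with AMPL.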

Similar resources

Enhanced Confix Stripping Stemmer and Ants Algorithm for Classifying News Document in Indonesian Language

The ant algorithm is a universal and flexible solution that was first designed for solving optimization problems such as the Traveling Salesman Problem. The analogy between ants finding the shortest path and finding the most similar documents inspired the ant-based text document clustering method. This method consists of two phases: finding the most similar documents (trial phase) and clusters mak...

A Continuous Optimization Model for Partial Digest Problem

The purpose of this paper is to model the Partial Digest Problem (PDP) as a mathematical programming problem. In this paper we present a new viewpoint of PDP. We formulate the PDP as a continuous optimization problem and develop a method to solve it. Finally, we construct a linear programming model for the problem with an additional constraint. This latter model can be solved by the simp...

Bi-objective optimization of multi-server intermodal hub-location-allocation problem in congested systems: modeling and solution

This paper models a new multi-objective intermodal hub-location-allocation problem in which both the origin and the destination hub facilities are modeled as M/M/m queuing systems. The problem is formulated as a constrained bi-objective optimization model that minimizes both the total cost and the total system time. A small-size problem is solved on the GAMS software t...

Overlay Problems for Music and Combinatorics

Motivated by the identification of the musical structure of pop songs, we introduce combinatorial problems involving overlays (non-overlapping substrings) and the covering of a text t by them. We present 4 problems and suggest solutions based on string pattern matching techniques. We show that decision problems of this type can be solved using an Aho-Corasick keyword automaton. We conjecture th...

An Unsupervised Approach to Develop Stemmer

This paper presents an unsupervised approach for developing a stemmer (for the Urdu and Marathi languages). Especially during the last few years, a wide range of information in Indian regional languages has been made available on the web in the form of e-data. However, access to these data repositories is very low because the efficient search engines/retrieval systems supporting these lang...

Journal:
  • CoRR

Volume: abs/1312.6802

Publication date: 2013